server: rm gossip usage from node/store/cluster ID allocation #56263
Conversation
Force-pushed from 26cc29e to f05cc1f.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)
pkg/server/node.go, line 561 at r1 (raw file):
// first store ID has already been pre-allocated for us.
//
// TODO(irfansharif): Is this sound? If we're restarting an already
Sigh, this is unsound on master, #56272. Good thing we didn't officially support multiple stores in 20.2.
#56272 and these TODOs are unrelated to this PR though, I just noticed given I was in the area. I'll work on cleaning them up separately.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/server/init.go, line 181 at r1 (raw file):
// on persisted data instead of this flag.
//
// [1]: This is not technically true for mixed version clusters where we leave
Hmm, isn't this gone now, too?
pkg/server/init.go, line 375 at r1 (raw file):
if errors.Is(err, ErrIncompatibleBinaryVersion) {
	// Propagate upwards; these is an error condition the caller knows
nit: this is
pkg/server/node.go, line 389 at r1 (raw file):
// Inform the RPC context of the node ID.
n.storeCfg.RPCContext.NodeID.Set(ctx, nodeID)
This is done elsewhere already?
pkg/server/node.go, line 561 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Sigh, this is unsound on master, #56272. Good thing we didn't officially support multiple stores in 20.2.
#56272 and these TODOs are unrelated to this PR though, I just noticed given I was in the area. I'll work on cleaning them up separately.
I mentioned this in #56272, but I'm looking again and still don't see the problem. `storeIDAlloc` is the number of store IDs we still need. We pass that to `allocateStoreIDs`, which gets us that many IDs from the KV layer. `startID` is the first ID we got back.
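For reference, here is a minimal model of that allocation shape, with an in-memory counter standing in for the KV increment. The names and numbers are illustrative only, not the actual CockroachDB code:

```go
package main

import "fmt"

// storeIDGen stands in for the KV-backed store-ID generator key; in the real
// code the bump is a KV increment, here it is just an in-memory counter.
var storeIDGen int64

// allocateStoreIDs models the shape described above: reserve `count` store
// IDs by bumping the generator, and return the first ID of the reserved
// block (the increment hands back the new running total).
func allocateStoreIDs(count int64) int64 {
	storeIDGen += count
	return storeIDGen - count + 1
}

func main() {
	// Say store ID 1 was already taken by the first store in the cluster.
	_ = allocateStoreIDs(1)

	// A node starting up with three additional empty engines asks for three
	// IDs and gets back the start of a contiguous block: 2 (then 3 and 4).
	startID := allocateStoreIDs(3)
	fmt.Println("startID =", startID) // startID = 2
}
```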
Oh, is the problem that the Join RPC also allocated a NodeID/StoreID for us but that we're not properly persisting it? I just briefly looked at the code and was expecting the `initServer` to actually bootstrap the store with the NodeID/ClusterID/StoreID, but now I'm not sure where that happens. Oh, it's in this code via `firstStoreID`? OK, that is all surprising, and now I can see why you're concerned. This does need to be cleaned up. I would generally prefer if the auxiliary and first stores were treated separately, just to get the code clean. In particular, it looks like we're bootstrapping the first store asynchronously?!
pkg/server/server.go, line 1396 at r1 (raw file):
s.rpcContext.ClusterID.Set(ctx, state.clusterID)
s.rpcContext.NodeID.Set(ctx, state.nodeID)
ah, here it is.
Reviewed 6 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/server/node.go, line 561 at r1 (raw file):
Like I mentioned in #56271 (comment), I think our saving grace here around the number of store IDs we allocated was accidental.
In particular, it looks like we're bootstrapping the first store asynchronously?!
Yup, yikes. Here's what I think the new code structure should be.
- In the init code:
  - If we're being bootstrapped:
    - Only initialize first store, leave remaining stores to start up code later
  - If we're joining an existing cluster:
    - Only initialize first store, leave remaining stores to start up code later
- Later, bootstrapping additional new stores:
  - Allocate len(auxiliary engines) store IDs, and bootstrap them asynchronously.
This should let us avoid threading in the first store ID, and always rely on
the KV increment operation to tell us what the first auxiliary store ID should
be.
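As a rough sketch of that split, with placeholder types standing in for the real server code (none of these names are the actual CockroachDB APIs):

```go
package server

import "context"

// Engine and StoreID are placeholders for the real storage and roachpb types;
// this only illustrates where each engine would get initialized under the
// structure proposed above.
type Engine struct{ initialized bool }

type StoreID int32

type initState struct {
	firstStoreID StoreID   // handed to us at bootstrap time or by the Join RPC
	engines      []*Engine // engines[0] is the first store's engine
}

// In the init code: whether we bootstrapped the cluster ourselves or joined an
// existing one, only the first engine is initialized here, synchronously.
func (s *initState) initFirstStore(ctx context.Context) {
	_ = ctx
	s.engines[0].initialized = true // written with s.firstStoreID in real code
}

// Later, during node start-up: allocate exactly len(auxiliary engines) store
// IDs from the KV-backed generator and initialize those engines, which can
// then safely happen asynchronously.
func (s *initState) initAuxiliaryStores(
	ctx context.Context, allocate func(ctx context.Context, n int64) StoreID,
) {
	aux := s.engines[1:]
	if len(aux) == 0 {
		return
	}
	startID := allocate(ctx, int64(len(aux)))
	for i, eng := range aux {
		_ = startID + StoreID(i) // this engine's ID, from a contiguous block
		eng.initialized = true
	}
}
```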
Cases to consider:
- [x] Bootstrapping first node with single engine
- [x] Bootstrapping first node with multiple engines
- [x] Joining a bootstrapped cluster, with single engine
- [x] Joining a bootstrapped cluster, with multiple engines
- [x] Restarting a bootstrapped node with no new engines
- [x] Restarting a bootstrapped node with new engines
What do you think? Ignoring the gossip code paths here because I want to remove them anyway.
56299: server: initialize first store within the init server r=irfansharif a=irfansharif

...when joining an existing cluster. This diff adds some sanity around how we bootstrap stores for nodes when we're informed by the existing cluster what the first store ID should be. Previously we were bootstrapping the first store asynchronously, and that is not what we want.

We first observed the implications of doing so in #56263, which attempted to remove the use of gossip in cluster/node ID distribution. There we noticed that our somewhat haphazard structure around initialization of stores could lead to doubly allocating store IDs (#56272). We were inadvertently safeguarded against this, as described in #56271, but this structure is still pretty confusing and needed cleanup. Now the new store initialization structure is the following:

```
- In the init code:
  - If we're being bootstrapped:
    - Initialize all the stores
  - If we're joining an existing cluster:
    - Only initialize first store, leave remaining stores to start up code later
- Later, when initializing additional new stores:
  - Allocate len(auxiliary engines) store IDs, and initialize them asynchronously.
```

This lets us avoid threading in the first store ID, and always rely on the KV increment operation to tell us what the store ID should be for the first additional store. We update TestAddNewStoresToExistingNodes to test allocation behaviour with more than just two stores.

Eventually we could simplify the init code to only initialize the first store when we're bootstrapping (there's a longstanding TODO from Andrei to that effect), but it's not strictly needed. This PR unblocks #56263.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
Force-pushed from b533630 to 1d97fb3.
PTAL, rebased atop #56299 and #56302 (first commit here, to be ignored).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)
pkg/server/init.go, line 181 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Hmm, isn't this gone now, too?
Done.
pkg/server/node.go, line 561 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Like I mentioned in #56271 (comment), I think our saving grace here around the number of store IDs we allocated was accidental.
In particular, it looks like we're bootstrapping the first store asynchronously?!
Yup, yikes. Here's what I think the new code structure should be.
- In the init code:
  - If we're being bootstrapped:
    - Only initialize first store, leave remaining stores to start up code later
  - If we're joining an existing cluster:
    - Only initialize first store, leave remaining stores to start up code later
- Later, bootstrapping additional new stores:
  - Allocate len(auxiliary engines) store IDs, and bootstrap them asynchronously.

This should let us avoid threading in the first store ID, and always rely on the KV increment operation to tell us what the first auxiliary store ID should be.

Cases to consider:
- [x] Bootstrapping first node with single engine
- [x] Bootstrapping first node with multiple engines
- [x] Joining a bootstrapped cluster, with single engine
- [x] Joining a bootstrapped cluster, with multiple engines
- [x] Restarting a bootstrapped node with no new engines
- [x] Restarting a bootstrapped node with new engines
What do you think? Ignoring the gossip code paths here because I want to remove them anyway.
so satisfying to see this cleaned up.
Reviewed 2 of 8 files at r2, 6 of 6 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)
Force-pushed from 1d97fb3 to 967361f.
TFTR! bors r+
Build failed:
Force-pushed from 967361f to dab8955.
I'm a bit lost with the CI failures here. Looks like it's timing out for some of our docker acceptance tests? See here. But I tried running them on my gceworker:
And they work? I'm a bit lost. @knz, @tbg: any ideas what I'm missing? I took a cursory glance at our acceptance testing harness setup, and nothing stands out to me. And the logs aren't super helpful. I've kicked off another run, but I'm pretty sure it'll fail as well (it has before).
bors r+
Build succeeded:
🤷‍♀️
very nice piece of work 💯 🎉
In 20.2, with #52526, we introduced the join RPC for new nodes in the
system to be handed a pre-allocated node ID, a pre-allocated store ID,
and be informed by the existing cluster what the cluster ID was.
In 20.1 and earlier this functionality was provided by our use of
gossip, and by constructing hollowed out servers (without node/store
IDs) and allocating an ID at the joining node. This required an
awkward dance around needing KV to be up in order to allocate the right
IDs, but not having an ID for the server process itself.
We retained the deprecated gossip codepaths in 20.2 in order to maintain
compatibility with 20.1. Now that 20.2 is cut however, in 21.1 code we
can always assume that we're talking to nodes that are at least 20.2,
and therefore nodes that are able to use the join RPC correctly. This
lets us strip out our usage of gossip for ID allocation, which in turn
paves the way for a few other simplifications.
We also delete the mixed-version/join-init roachtest, as it was only
ever relevant for the 20.1/20.2 cycle. The current structure around
joining nodes being allocated/informed of the right IDs has adequate
coverage with TestClusterConnectivity.
Release note: None
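As a rough illustration of the handshake this commit message describes, a joining node ends up holding something like the following (illustrative type and field names only, not the actual protobuf definitions):

```go
package main

import "fmt"

// joinResponse models the information a joining node is handed by an existing
// cluster via the join RPC: the cluster it is joining plus pre-allocated node
// and first-store IDs.
type joinResponse struct {
	ClusterID string
	NodeID    int32
	StoreID   int32
}

func main() {
	// With the join RPC, the server process knows its IDs up front, instead
	// of learning them via gossip and local allocation as in 20.1 and earlier.
	resp := joinResponse{ClusterID: "cafe-f00d", NodeID: 4, StoreID: 7}
	fmt.Printf("joining cluster %s as n%d, first store s%d\n",
		resp.ClusterID, resp.NodeID, resp.StoreID)
}
```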